Recurrent Relational Networks
This paper is concerned with learning to solve tasks that require a chain of
interdependent steps of relational inference, like answering complex questions
about the relationships between objects, or solving puzzles where the smaller
elements of a solution mutually constrain each other. We introduce the
recurrent relational network, a general purpose module that operates on a graph
representation of objects. As a generalization of the relational network of
Santoro et al. [2017], it can augment any neural network model with the
capacity to do many-step relational reasoning. We achieve state-of-the-art
results on
the bAbI textual question-answering dataset with the recurrent relational
network, consistently solving 20/20 tasks. As bAbI is not particularly
challenging from a relational reasoning point of view, we introduce
Pretty-CLEVR, a new diagnostic dataset for relational reasoning. In the
Pretty-CLEVR set-up, we can vary the question to control for the number of
relational reasoning steps that are required to obtain the answer. Using
Pretty-CLEVR, we probe the limitations of multi-layer perceptrons, relational
and recurrent relational networks. Finally, we show how recurrent relational
networks can learn to solve Sudoku puzzles from supervised training data, a
challenging task requiring upwards of 64 steps of relational reasoning. We
achieve state-of-the-art results amongst comparable methods by solving 96.6% of
the hardest Sudoku puzzles.
Comment: Accepted at NIPS 201
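As a rough illustration of the idea (not the paper's implementation), the following sketch runs a few steps of message passing over a fully connected graph of node states. Single randomly initialised tanh layers stand in for the message and update networks (the paper uses MLPs and an LSTM update), so the numbers are meaningless, but the many-step relational structure is visible:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(in_dim, out_dim):
    """A single random tanh layer standing in for the paper's MLPs."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda v: np.tanh(v @ W)

n_nodes, dim = 4, 8
# fully connected graph, as used for Sudoku and Pretty-CLEVR
edges = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]

x = rng.normal(size=(n_nodes, dim))   # fixed node inputs
h = x.copy()                          # recurrent node states
f_msg = layer(2 * dim, dim)           # message function f([h_i, h_j])
g_upd = layer(3 * dim, dim)           # update g([x_i, h_i, summed messages])

for _ in range(3):                    # three steps of relational reasoning
    msgs = np.zeros((n_nodes, dim))
    for i, j in edges:
        # message sent from node i to node j at this step
        msgs[j] += f_msg(np.concatenate([h[i], h[j]]))
    # every node updates its state from its input, old state and inbox
    h = np.array([g_upd(np.concatenate([x[i], h[i], msgs[i]]))
                  for i in range(n_nodes)])

print(h.shape)  # one state vector per node, refined over several steps
```

Running more iterations of the loop corresponds to allowing more steps of relational reasoning, which is the knob the paper probes with Pretty-CLEVR and Sudoku.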
CloudScan - A configuration-free invoice analysis system using recurrent neural networks
We present CloudScan, an invoice analysis system that requires zero
configuration or upfront annotation. In contrast to previous work, CloudScan
does not rely on templates of invoice layout; instead, it learns a single global
model of invoices that naturally generalizes to unseen invoice layouts. The
model is trained using data automatically extracted from end-user provided
feedback. This automatic training data extraction removes the requirement for
users to annotate the data precisely. We describe a recurrent neural network
model that can capture long range context and compare it to a baseline logistic
regression model corresponding to the current CloudScan production system. We
train and evaluate the system on 8 important fields using a dataset of 326,471
invoices. The recurrent neural network and baseline model achieve 0.891 and
0.887 average F1 scores respectively on seen invoice layouts. For the harder
task of unseen invoice layouts, the recurrent neural network model outperforms
the baseline with 0.840 average F1 compared to 0.788.
Comment: Presented at ICDAR 201
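The word-classification framing shared by the baseline and the recurrent model can be sketched as follows. The label set, features and weights below are all illustrative stand-ins (the weights are random and untrained), so the predictions carry no meaning, but the shape of the problem is the same: each word of an invoice is mapped to a field label.

```python
import numpy as np

rng = np.random.default_rng(1)

FIELDS = ["none", "invoice_number", "date", "total"]  # illustrative label set

def featurize(word):
    """Toy features of the kind a word classifier might use:
    digit fraction, clipped length, and two punctuation indicators."""
    digits = sum(c.isdigit() for c in word)
    return np.array([digits / max(len(word), 1),
                     min(len(word), 20) / 20,
                     float("/" in word),
                     float("." in word)])

# softmax regression over word features: the baseline model family;
# the paper's RNN additionally conditions on long-range context
W = rng.normal(scale=0.1, size=(4, len(FIELDS)))

def classify(word):
    logits = featurize(word) @ W
    p = np.exp(logits - logits.max())
    return FIELDS[int(np.argmax(p / p.sum()))]

for w in ["Invoice", "2017/03/01", "1024.50"]:
    print(w, "->", classify(w))
```

The key limitation this framing exposes is that each word is classified independently of its neighbours, which is exactly what the recurrent model relaxes by capturing long-range context.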
Attend, Copy, Parse -- End-to-end information extraction from documents
Document information extraction tasks performed by humans create data
consisting of a PDF or document-image input and extracted string outputs. This
end-to-end data is naturally consumed and produced when performing the task
because it is valuable in and of itself, and so it is available at no
additional cost. Unfortunately, state-of-the-art word classification methods
for information extraction cannot use this data, instead requiring word-level
labels, which are expensive to create and consequently not available for many
real-life tasks. In this paper, we propose the Attend, Copy, Parse architecture,
a deep neural network model that can be trained directly on end-to-end data,
bypassing the need for word-level labels. We evaluate the proposed architecture
on a large diverse set of invoices, and outperform a state-of-the-art
production system based on word classification. We believe our proposed
architecture can be used for many real-life information extraction tasks where
word classification cannot be applied due to a lack of the required word-level
labels.
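The three stages named in the title can be caricatured in a few lines. The attention scores here are hand-set rather than produced by a network, and the "parse" step is a toy amount normaliser, but the pipeline follows the abstract's description: soft attention over the document's words, a copy of the selected word, and a parser producing the normalised output string.

```python
import numpy as np

words = ["Total:", "1.024,50", "EUR", "Date:", "01-03-2017"]

def attend(scores):
    """Softmax attention over the document's words. In the real model the
    scores come from a neural network; here they are hand-set."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def copy(weights):
    """Hard copy of the highest-attention word. Training would keep the soft
    attention distribution so the whole pipeline stays differentiable."""
    return words[int(np.argmax(weights))]

def parse_amount(s):
    """A tiny 'parse' step: normalise a European-format amount string."""
    return s.replace(".", "").replace(",", ".")

scores = np.array([-1.0, 3.0, 0.0, -2.0, -1.5])  # pretend scores for "total"
w = attend(scores)
print(parse_amount(copy(w)))  # -> 1024.50
```

Because supervision is only the final string ("1024.50"), no word-level labels are ever needed: the attention has to discover on its own which word to copy.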
End-to-end information extraction without token-level supervision
Most state-of-the-art information extraction approaches rely on token-level
labels to find the areas of interest in text. Unfortunately, these labels are
time-consuming and costly to create, and consequently not available for many
real-life IE tasks. To make matters worse, token-level labels are usually not
the desired output, but just an intermediary step. End-to-end (E2E) models,
which take raw text as input and produce the desired output directly, need not
depend on token-level labels. We propose an E2E model based on pointer
networks, which can be trained directly on pairs of raw input and output text.
We evaluate our model on the ATIS data set, the MIT restaurant corpus and the
MIT movie corpus, and compare to neural baselines that do use token-level
labels. We achieve competitive results, within a few percentage points of the
baselines, showing the feasibility of E2E information extraction without the
need for token-level labels. This opens up new possibilities, as for many
tasks currently addressed by human extractors, raw input and output data are
available, but not token-level labels.
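The pointer-network mechanism the model builds on can be sketched as a single decoding step. The encoder states, projection matrix and decoder query below are random stand-ins, so which token gets picked is arbitrary, but the key property is visible: the output distribution ranges over the positions of the input itself, so the model can only copy input tokens.

```python
import numpy as np

rng = np.random.default_rng(2)

tokens = ["book", "a", "table", "at", "Nobu", "for", "two"]
dim = 6
enc = rng.normal(size=(len(tokens), dim))   # stand-in encoder states
W = rng.normal(scale=0.5, size=(dim, dim))  # stand-in attention projection

def point(query):
    """One pointer-network decoding step: attention scores over the *input*
    positions, normalised to a distribution, then a hard pick."""
    scores = enc @ W @ query
    e = np.exp(scores - scores.max())
    p = e / e.sum()
    return tokens[int(np.argmax(p))]

query = rng.normal(size=dim)  # stand-in decoder state for one output slot
print(point(query))           # one copied input token
```

Since every output token is tied to an input position, supervision on the output string alone implicitly tells the model where to point, which is why no token-level labels are needed.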
Finding Game Levels with the Right Difficulty in a Few Trials through Intelligent Trial-and-Error
Methods for dynamic difficulty adjustment allow games to be tailored to
particular players to maximize their engagement. However, current methods often
only modify a limited set of game features such as the difficulty of the
opponents, or the availability of resources. Other approaches, such as
experience-driven Procedural Content Generation (PCG), can generate complete
levels with desired properties such as levels that are neither too hard nor too
easy, but require many iterations. This paper presents a method that can
generate and search for complete levels with a specific target difficulty in
only a few trials. This advance is enabled through an Intelligent
Trial-and-Error algorithm, originally developed to allow robots to adapt
quickly. Our algorithm first creates a large variety of different levels that
vary across predefined dimensions such as leniency or map coverage. The
performance of an AI playing agent on these maps gives a proxy for how
difficult the level would be for another AI agent (e.g. one that employs Monte
Carlo Tree Search instead of Greedy Tree Search); using this information, a
Bayesian Optimization procedure is deployed, updating the difficulty of the
prior map to reflect the ability of the agent. The approach can reliably find
levels with a specific target difficulty for a variety of planning agents in
only a few trials, while maintaining an understanding of their skill landscape.
Comment: To be presented at the Conference on Games 202
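The trial-and-error loop can be caricatured without a Gaussian process. In this toy version the precomputed map is a vector of prior difficulties, the "playthrough" is a noisy shifted copy of the prior, and a scalar correction stands in for the GP posterior update used by Intelligent Trial-and-Error. The loop structure, however, is the same: pick the level whose adjusted prior is closest to the target, play it, and update the prior.

```python
import numpy as np

rng = np.random.default_rng(3)

target = 0.6                       # desired difficulty for the new agent
prior = rng.uniform(size=50)       # difficulty proxy per level, precomputed
                                   # from playthroughs by a reference agent
offset = rng.normal(scale=0.15)    # hidden skill gap of the new agent

def play(level):
    """Stand-in for letting the target agent actually play the level."""
    return float(np.clip(prior[level] + offset + rng.normal(scale=0.02),
                         0.0, 1.0))

correction = 0.0                   # running estimate of the skill gap
best, best_err = None, np.inf
for trial in range(5):             # only a few trials, as in the paper
    # pick the level whose corrected prior difficulty is closest to target
    level = int(np.argmin(np.abs(prior + correction - target)))
    observed = play(level)
    correction = observed - prior[level]   # update the prior to this agent
    err = abs(observed - target)
    if err < best_err:
        best, best_err = level, err

print(best, round(best_err, 3))
```

The real method replaces the scalar correction with a Gaussian-process posterior over the whole map, which is what lets it keep "an understanding of the skill landscape" rather than just a point estimate.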
EvoCraft: A New Challenge for Open-Endedness
This paper introduces EvoCraft, a framework for Minecraft designed to study
open-ended algorithms. We introduce an API that provides an open-source Python
interface for communicating with Minecraft to place and track blocks. In
contrast to previous work in Minecraft that focused on learning to play the
game, the grand challenge we pose here is to automatically search for
increasingly complex artifacts in an open-ended fashion. Compared to other
environments used to study open-endedness, Minecraft allows the construction of
almost any kind of structure, including actuated machines with circuits and
mechanical components. We present initial baseline results in evolving simple
Minecraft creations through both interactive and automated evolution. While
evolution succeeds when tasked to grow a structure towards a specific target,
it is unable to find a solution when rewarded for creating a simple machine
that moves. Thus, EvoCraft offers a challenging new environment for automated
search methods (such as evolution) to find complex artifacts that we hope will
spur the development of more open-ended algorithms. A Python implementation of
the EvoCraft framework is available at:
https://github.com/real-itu/Evocraft-py
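The target-growing experiment can be mimicked offline. This sketch evolves an 8x8 block grid toward a fixed target pattern with a simple elitist evolutionary loop; the grid representation, mutation rate and fitness are all illustrative stand-ins. In EvoCraft itself the candidate structure would instead be placed in the Minecraft world through the Python API and evaluated there.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for "grow a structure towards a specific target": evolve a
# 2D block grid to match a target pattern scored locally, instead of a 3D
# block structure placed and tracked via the Minecraft server.
target = np.zeros((8, 8), dtype=int)
target[2:6, 2:6] = 1                          # a simple square "building"

def fitness(grid):
    return -int(np.sum(grid != target))       # penalty per mismatched block

pop = [rng.integers(0, 2, size=(8, 8)) for _ in range(20)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)       # elitist selection
    parents = pop[:5]
    # each parent produces three mutated offspring via random bit flips
    pop = parents + [np.where(rng.random((8, 8)) < 0.05, 1 - p, p)
                     for p in parents for _ in range(3)]
best = max(pop, key=fitness)
print(fitness(best))   # 0 would mean the target is matched exactly
```

A fitness based on matching a target is exactly the case where the abstract reports evolution succeeding; rewarding open-ended behaviour such as movement, where no target pattern exists, is the part that remains unsolved.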